In Proceedings of the Second Intl. Workshop on Services in Distributed and Networked Environments (SDNE)
Application-Level Document Caching in the Internet
Abstract
With the increasing demand for document transfer services such as the World Wide Web comes a need for better resource management to reduce the latency of documents in these systems. To address this need, we analyze the potential for document caching at the application level in document transfer services. We have collected traces of actual executions of Mosaic, reflecting over half a million user requests for WWW documents. Using those traces, we study the tradeoffs between caching at three levels in the system, and the potential for use of application-level information in the caching system. Our traces show that while a high hit rate in terms of URLs is achievable, a much lower hit rate is possible in terms of bytes, because most profitably cached documents are small. We consider the performance of caching when applied at the level of individual user sessions, at the level of individual hosts, and at the level of a collection of hosts on a single LAN. We show that the performance gain achievable by caching at the session level (which is straightforward to implement) is nearly all of that achievable at the LAN level (where caching is more difficult to implement). However, when resource requirements are considered, LAN-level caching becomes much more desirable, since it can achieve a given level of caching performance using a much smaller amount of cache space. Finally, we consider the use of organizational boundary information as an example of the potential for use of application-level information in caching. Our results suggest that distinguishing between documents produced locally and those produced remotely can provide useful leverage in designing caching policies, because of differences in the potential for sharing these two document types among multiple users.

This work has been partially supported by NSF grant CCR.

Introduction

Some of the most popular services currently provided by the Internet are the distributed information systems such as the World Wide Web (WWW), the Anonymous FTP transfer system, the Wide Area Information System (WAIS), and the Gopher system. These services are characterized by a many-to-many pattern of file transfer: most hosts in the system are potentially capable of serving files as well as requesting them. We refer to these systems as document transfer systems, and to the files involved as documents, since each file has essentially been electronically published.

An increasingly large fraction of available bandwidth on the Internet is being used to transfer documents. Strategies for reducing the latency of document access, the network bandwidth demand of document transfers, and the demand on document servers are becoming increasingly important.

Techniques that could reduce document latency, network bandwidth demand, and server demand include data caching and replication. However, in contrast to most distributed file systems, document transfer services usually incorporate simple caching strategies (if any) and do not typically provide location transparency.

While techniques based on distributed file systems could be used to improve significantly the performance of document transfer systems, there are a number of advantages to considering caching and replication at the application level rather than at the filesystem level. First, application-level caching does not require all users to agree on a common filesystem; it enables heterogeneous systems to participate easily. Second, and more important, application-level caching allows cache strategies to make use of the higher semantic content available at the application level, to exploit such information as document type, user profile, user past history, document content, and organizational boundaries.

This paper describes initial investigations into application-level strategies for document caching and replication on wide area networks. While we are in general concerned with all three aspects of the problem (document latency, network demand, and server demand), we focus in this paper on minimizing document latency as our primary goal. As a result, we concentrate on caching strategies rather than document replication, which is mainly a technique for reducing server load.

We employed a trace-driven simulation approach to studying the document caching problem. First, we collected logs of users accessing the World Wide Web. We instrumented a version of NCSA Mosaic to keep a record of all documents (named by their Uniform Resource Locators, URLs) accessed by the user during an execution of Mosaic. We refer to each execution of Mosaic as a session, and we call the log of each session a trace. The results in this paper are based on these traces. Next, we used the traces as input to an event-driven simulation that determined how various caching strategies and cache sizes affected the performance of the system. The simulation outputs a set of statistics that describes the effectiveness of caching in terms of bytes transferred and document latency.

This paper discusses cache policies that operate at three levels: the session level, in which caches for separate sessions are managed independently; the host level, in which caches for separate hosts are managed independently; and the LAN level, in which caches for separate LANs are managed independently. Session caches are similar to the policies used in current versions of NCSA Mosaic. Host caches consist of a single host's buffers allocated to document caching that persist across invocations of the client. Host caches could be implemented by a local server, or by periodically synchronizing each application's memory-based cache with a disk buffer. LAN caches consist of a cache managed by the clients on a single LAN, as in [ ]. LAN caches require cooperation among the participating clients; host and session caches do not.

Our work is unique in a number of ways. First, we base it on the large amount of user trace data we have collected. Second, we consider caching policies that can be implemented without client cooperation as well as policies that require client cooperation. Finally, we use application-level information in analyzing our trace data and in formulating cache policies.

Our results show that caching strategies that are nearly as effective as a cooperative strategy can be implemented at the application level without cooperation; in fact, session-level strategies yield nearly all the gains of host-level and LAN-level strategies. In addition, while session-level caching is nearly as effective as the others, it consumes much more system resources. For a given level of performance, less system resources are consumed by host-level caching, and even less are consumed by LAN-level caching. Thus, if a fixed amount of system resources is to be allocated to caching, they are best allocated to LAN-level caching.

Finally, our data suggest that the use of application-level information can significantly improve some aspects of system performance; in particular, identifying documents that originate outside of the local organizational boundary (in our case, the Boston University community) is useful in understanding and tuning cache performance. We discuss cache policies that favor or discourage retention of local documents. We show that documents originating outside the local organization show markedly different sharing patterns from those that are served locally.

The remainder of the paper consists of: first, a description of our trace data and the collection process; next, the results of our simulations for various caching policies using that data; next, a comparison of our work with related research; and finally, our conclusions.

[Table: Summary Statistics of Trace Data. Rows: Sessions; Users; Documents Requested; Unique Documents Requested; Bytes Requested (MB); Unique Bytes Requested (MB).]
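
The three cache levels and the two hit-rate measures discussed above can be illustrated with a minimal trace-driven sketch. It is not the simulator used for the reported results: the `Request` record layout, the `simulate` function, the example trace, and the choice of LRU replacement are all assumptions made for illustration, since the text here specifies neither the log format nor the replacement policy.

```python
from collections import OrderedDict, namedtuple

# Hypothetical trace record; the actual Mosaic log format is not given here.
Request = namedtuple("Request", "lan host session url size")


class LRUCache:
    """Byte-bounded cache of documents keyed by URL (LRU is an assumption)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.docs = OrderedDict()  # url -> size, least recently used first

    def access(self, url, size):
        """Return True on a hit; on a miss, insert the document if it fits."""
        if url in self.docs:
            self.docs.move_to_end(url)
            return True
        if size <= self.capacity:
            # Evict least recently used documents until the new one fits.
            while self.used + size > self.capacity:
                _, evicted_size = self.docs.popitem(last=False)
                self.used -= evicted_size
            self.docs[url] = size
            self.used += size
        return False


def simulate(trace, level, capacity_bytes):
    """Replay a trace with one independent cache per session, host, or LAN.

    Returns (URL hit rate, byte hit rate, number of caches created).
    """
    key_of = {
        "session": lambda r: (r.lan, r.host, r.session),
        "host": lambda r: (r.lan, r.host),
        "lan": lambda r: r.lan,
    }[level]
    caches = {}
    hits = hit_bytes = total = total_bytes = 0
    for r in trace:
        key = key_of(r)
        if key not in caches:
            caches[key] = LRUCache(capacity_bytes)
        total += 1
        total_bytes += r.size
        if caches[key].access(r.url, r.size):
            hits += 1
            hit_bytes += r.size
    return hits / total, hit_bytes / total_bytes, len(caches)


# Invented example trace: one small shared document and one large document.
EXAMPLE_TRACE = [
    Request("lan1", "h1", "s1", "/a", 100),
    Request("lan1", "h1", "s1", "/a", 100),
    Request("lan1", "h1", "s2", "/a", 100),
    Request("lan1", "h2", "s3", "/a", 100),
    Request("lan1", "h2", "s3", "/big", 1000),
]

for level in ("session", "host", "lan"):
    print(level, simulate(EXAMPLE_TRACE, level, 2000))
```

In this sketch each cache gets the same capacity, so the total space consumed grows with the number of caches returned as the third value. That is one simple way to see why a single LAN-level cache can reach a given hit rate with less total space than many session-level caches, and why the byte hit rate falls below the URL hit rate when the repeatedly requested documents are small.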
Publication date: 1995